How to download data (Example: ERA5 from Climate Data Store)

Interpolating/Downscaling Dataset

Authors:

Abdullah A. Fahad (a.fahad@nasa.gov) \ Rifat Bhuiyan \ Torikul Islam Sanjid (torikul.sanjid@gmail.com) \ Tahmidul Azom Sany (tsany@gmu.edu)

Instead of manually selecting and submitting multiple request you can use cdsapi.

Prerequisite:

  1. Create an account on Climate Data Store
  2. Go here and copy the url and key.
  3. create a dot file (.cdsapi) and place it on your root directory.(If you are not sure about the location just run cdsapi.Client() and the error message will tell you the location)

More help

In [ ]:
#Install the CDS API client
!pip install cdsapi
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting cdsapi
  Downloading cdsapi-0.6.1.tar.gz (13 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: requests>=2.5.0 in /usr/local/lib/python3.10/dist-packages (from cdsapi) (2.27.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from cdsapi) (4.65.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.5.0->cdsapi) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.5.0->cdsapi) (2022.12.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests>=2.5.0->cdsapi) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.5.0->cdsapi) (3.4)
Building wheels for collected packages: cdsapi
  Building wheel for cdsapi (setup.py) ... done
  Created wheel for cdsapi: filename=cdsapi-0.6.1-py2.py3-none-any.whl size=12009 sha256=6bb2b97c0a0e19b499c300eb9364809e5cfa0fde85934ccb59586e2b96af41fc
  Stored in directory: /root/.cache/pip/wheels/7c/63/08/45461d6f6636c1aba7846828d8c787a064073945048f76d44a
Successfully built cdsapi
Installing collected packages: cdsapi
Successfully installed cdsapi-0.6.1
In [ ]:
import cdsapi

Sending request

The following cell sends requests to download the variables "relative_humidity, specific_humidity, u_component_of_wind, v_component_of_wind", at three pressure levels (150hPa, 500hPa, 850hPa) of all 24 hours everyday of 1981 to 1984 from the region at longitude 79 to 101 and latitude 25 to 0 in netcdf format.

In [ ]:
#Example:
#you can also get the templet code from "Show API request" option after selecting data on your desired dataset
#remove download.nc from the templet. Also for the loop to work I replaced year value with loop counter.
request_id = [] #Each request has an unique id to be used later when dowloading
for i in range(1981,1985):
  year = str(i)
  c = cdsapi.Client(wait_until_complete=False, delete=False)
  r = c.retrieve(
      'reanalysis-era5-pressure-levels',   #Name of the dataset
      {
          'product_type': 'reanalysis',    #All the desired variables and coordinates
          'variable': [
              'relative_humidity', 'specific_humidity',
              'u_component_of_wind', 'v_component_of_wind',
          ],
          'pressure_level': [
              '150', '500', '850',
          ],
          'year': year,
          'month': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12'],
          'day': [
              '01', '02', '03',
              '04', '05', '06',
              '07', '08', '09',
              '10', '11', '12',
              '13', '14', '15',
              '16', '17', '18',
              '19', '20', '21',
              '22', '23', '24',
              '25', '26', '27',
              '28', '29', '30',
              '31',
          ],
          'time': [
              '00:00', '01:00', '02:00',
              '03:00', '04:00', '05:00',
              '06:00', '07:00', '08:00',
              '09:00', '10:00', '11:00',
              '12:00', '13:00', '14:00',
              '15:00', '16:00', '17:00',
              '18:00', '19:00', '20:00',
              '21:00', '22:00', '23:00',
          ],
          'area': [                            #[North, west, south, east]
              25, 79, 0,
              101,
          ],
          'format': 'netcdf',
      },)
  request_id.append(r.reply['request_id'])        #Storing request ID's
2023-05-07 19:50:58,806 INFO Welcome to the CDS
INFO:cdsapi:Welcome to the CDS
2023-05-07 19:50:58,810 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-pressure-levels
INFO:cdsapi:Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-pressure-levels
2023-05-07 19:50:59,079 INFO Welcome to the CDS
INFO:cdsapi:Welcome to the CDS
2023-05-07 19:50:59,083 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-pressure-levels
INFO:cdsapi:Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-pressure-levels
2023-05-07 19:50:59,522 INFO Welcome to the CDS
INFO:cdsapi:Welcome to the CDS
2023-05-07 19:50:59,525 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-pressure-levels
INFO:cdsapi:Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-pressure-levels
2023-05-07 19:50:59,793 INFO Welcome to the CDS
INFO:cdsapi:Welcome to the CDS
2023-05-07 19:50:59,798 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-pressure-levels
INFO:cdsapi:Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-pressure-levels

Saving Request ID's(Important)

Request takes may take long time to complete(Check here to see your request status) To avoid losing request ID's copy and paste them to another cell so they are saved even if you delete runtime.

In [ ]:
request_id
Out[ ]:
['fa0989d9-cea1-4a99-a842-9043912b1398',
 '91b407ec-7986-4c25-861d-1e1ca108b62b',
 '12338118-7efa-4138-b3ee-6d08cca25c0e',
 'ec54ffd4-ba53-4c57-88e0-4a1f961d24aa']
In [ ]:
request_id = ['fa0989d9-cea1-4a99-a842-9043912b1398',
 '91b407ec-7986-4c25-861d-1e1ca108b62b',
 '12338118-7efa-4138-b3ee-6d08cca25c0e',
 'ec54ffd4-ba53-4c57-88e0-4a1f961d24aa']

Downloading data

At a later time when the requests are complete the following cell will sequencially download the files

In [ ]:
#At a later time when requests are completed
for i in range(len(request_id)): # request_id from above
  reply = dict(request_id=request_id[i])
  new_client = cdsapi.Client(wait_until_complete=False, delete=False)
  result = cdsapi.api.Result(new_client, reply)
  result.update()
  reply = result.reply
  if reply['state'] == 'completed':
      result.download('/content/drive/MyDrive/data/'+str(1981+i) +'.nc')      #Specifying download location and file name ##netcdf file extention .nc
2023-05-07 21:52:06,174 INFO Downloading https://download-0000-clone.copernicus-climate.eu/cache-compute-0000/cache/data5/adaptor.mars.internal-1683490616.4269726-6715-4-fa0989d9-cea1-4a99-a842-9043912b1398.nc to /content/drive/MyDrive/data/1981.nc (759.2M)
INFO:cdsapi:Downloading https://download-0000-clone.copernicus-climate.eu/cache-compute-0000/cache/data5/adaptor.mars.internal-1683490616.4269726-6715-4-fa0989d9-cea1-4a99-a842-9043912b1398.nc to /content/drive/MyDrive/data/1981.nc (759.2M)
2023-05-07 21:53:43,294 INFO Download rate 7.8M/s
INFO:cdsapi:Download rate 7.8M/s
2023-05-07 21:53:43,987 INFO Downloading https://download-0018.copernicus-climate.eu/cache-compute-0018/cache/data6/adaptor.mars.internal-1683492215.1673737-20363-18-91b407ec-7986-4c25-861d-1e1ca108b62b.nc to /content/drive/MyDrive/data/1982.nc (759.2M)
INFO:cdsapi:Downloading https://download-0018.copernicus-climate.eu/cache-compute-0018/cache/data6/adaptor.mars.internal-1683492215.1673737-20363-18-91b407ec-7986-4c25-861d-1e1ca108b62b.nc to /content/drive/MyDrive/data/1982.nc (759.2M)
2023-05-07 22:00:33,146 INFO Download rate 1.9M/s
INFO:cdsapi:Download rate 1.9M/s
2023-05-07 22:00:33,890 INFO Downloading https://download-0002-clone.copernicus-climate.eu/cache-compute-0002/cache/data4/adaptor.mars.internal-1683493521.9843454-19941-9-12338118-7efa-4138-b3ee-6d08cca25c0e.nc to /content/drive/MyDrive/data/1983.nc (759.2M)
INFO:cdsapi:Downloading https://download-0002-clone.copernicus-climate.eu/cache-compute-0002/cache/data4/adaptor.mars.internal-1683493521.9843454-19941-9-12338118-7efa-4138-b3ee-6d08cca25c0e.nc to /content/drive/MyDrive/data/1983.nc (759.2M)
2023-05-07 22:01:35,860 INFO Download rate 12.3M/s
INFO:cdsapi:Download rate 12.3M/s
2023-05-07 22:01:36,032 INFO Downloading https://download-0001-clone.copernicus-climate.eu/cache-compute-0001/cache/data8/adaptor.mars.internal-1683495144.6553042-29362-18-ec54ffd4-ba53-4c57-88e0-4a1f961d24aa.nc to /content/drive/MyDrive/data/1984.nc (759.2M)
INFO:cdsapi:Downloading https://download-0001-clone.copernicus-climate.eu/cache-compute-0001/cache/data8/adaptor.mars.internal-1683495144.6553042-29362-18-ec54ffd4-ba53-4c57-88e0-4a1f961d24aa.nc to /content/drive/MyDrive/data/1984.nc (759.2M)
2023-05-07 22:02:47,529 INFO Download rate 10.6M/s
INFO:cdsapi:Download rate 10.6M/s

Interpolating or Downscaling dataset

Interpolation is a technique of estimating the value of a function at a point that is not directly or actually measured by using the konwn data points.

In case of climate physics, interpolation is used to estimate the value of a climate variable such as temperature or precipitation at a location that is not actually measured.This is done by using the values of the climate variable at nearby locations that are known.

Example

Here we'll create a dummy dataarray data which has temperature values over a time axis.The time array represents the time coordinate values and the temperature array contains corresponding temperature values.We suppose this is an observed data array.

Then we define a new time array,new_time where we want to know the values of the temperatures.But we don't have any values at these new_time points in the observed data array.This is the stage where we use interpolation to know the temperature values at these new_time values.

syntax of interpolation:

DataArray.interp(coords=None, method='linear', assume_sorted=False, kwargs=None, **coords_kwargs)

It has the following arguments:

  • coords: Mapping from dimension names to the new coordinates. New coordinate can be a scalar, array-like or DataArray. If DataArrays are passed as new coordinates, their dimensions are used for the broadcasting. Missing values are skipped.
  • method: A string indicating which method to use for interpolation. The following methods are supported:
    • linear: Linear interpolation.
    • nearest: Nearest-neighbor interpolation.
    • zero: Interpolates all values to zero.
    • slinear: 2nd-order spline interpolation with first derivatives at the endpoints set to zero.
    • quadratic: 3rd-order spline interpolation with second derivatives at the endpoints set to zero.
    • cubic: 4th-order spline interpolation with third derivatives at the endpoints set to zero.
  • assume_sorted: A boolean indicating whether the new coordinates are assumed to be sorted. If True, the interpolation is performed more efficiently.
  • kwargs: Additional keyword arguments are passed to the interpolation routine.
  • coords_kwargs: Keyword arguments for the DataArray.sel() method used to select the values to interpolate.

The DataArray.interp() method returns a new DataArray with the interpolated values.

In [ ]:
import numpy as np
import xarray as xr

# Create a sample DataArray with temperature values
time = np.arange(0, 5)
temperature = np.array([20, 25, 22, 18, 15])
data = xr.DataArray(temperature, coords={'time': time}, dims=['time'])

# Define new time values for interpolation
new_time = np.arange(0, 4.5, 0.5)

# Interpolate the temperature DataArray to the new time values using linear interpolation
interpolated_data = data.interp(coords={'time': new_time}, method='linear')

# Print the original and interpolated DataArrays
print("Original DataArray:")
print(data)
print("\nInterpolated DataArray:")
print(interpolated_data)
Original DataArray:
<xarray.DataArray (time: 5)>
array([20, 25, 22, 18, 15])
Coordinates:
  * time     (time) int64 0 1 2 3 4

Interpolated DataArray:
<xarray.DataArray (time: 9)>
array([20. , 22.5, 25. , 23.5, 22. , 20. , 18. , 16.5, 15. ])
Coordinates:
  * time     (time) float64 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0

Important

If we carefully look at the time of original data array and time of the interpolated data array we will see both data range from o to 4, but the intervals are diffrerent. But it is allowed to have the new time values extend beyond the range of the original time values when performing interpolation. In Xarray, the DataArray.interp() method can handle extrapolation, which means it can estimate values outside the range of the original data based on the provided interpolation method.

For example, if you have original time values ranging from 0 to 4 and we define new time values from 0 to 8, the DataArray.interp() method will still perform the interpolation using the defined interpolation method (e.g., linear) for the new time values within the original range (0 to 4), and then it will extrapolate values for the new time values beyond the original range (4 to 8).

However, it's important to note that extrapolation can be less reliable.

Visual difference between original and interpolated data

In [ ]:
import matplotlib.pyplot as plt

# Create a figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Plot the original data in the first subplot
ax1.plot(data['time'], da, 'bo-', label='Original')
ax1.set_xlabel('Time')
ax1.set_ylabel('Temperature')
ax1.set_title('Originally observed Temperature')

# Plot the interpolated data in the second subplot
ax2.plot(interpolated_data['time'], interpolated_data, 'r.-', label='Interpolated')
ax2.set_xlabel('Time')
ax2.set_ylabel('Temperature')
ax2.set_title('Interpolated Temperature')
ax2.legend()

# Adjust the spacing between subplots
plt.tight_layout()

# Display the plot
plt.show()
In [ ]:
# Spatial interpolation add Global
# Shift griding 0-360 and -180 to +180 LON SHift

Spatial interpolation

In [ ]:
# Run this cell when you are in colab and then restart runtime
# Do not worry about this cell, you can always copy paste

!apt-get install libproj-dev proj-data proj-bin
!apt-get install libgeos-dev
!pip install cython
!pip install cartopy

!apt-get -qq install python-cartopy python3-cartopy
!pip uninstall -y shapely    # cartopy and shapely aren't friends (early 2020)
!pip install shapely --no-binary shapely
In [ ]:
# Let's download data by another method. This could take couple of minutes.
!wget https://downloads.psl.noaa.gov/Datasets/noaa.oisst.v2.highres/sst.mon.mean.nc
--2023-06-23 14:22:19--  https://downloads.psl.noaa.gov/Datasets/noaa.oisst.v2.highres/sst.mon.mean.nc
Resolving downloads.psl.noaa.gov (downloads.psl.noaa.gov)... 140.172.38.86
Connecting to downloads.psl.noaa.gov (downloads.psl.noaa.gov)|140.172.38.86|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2077814021 (1.9G) [application/x-netcdf]
Saving to: ‘sst.mon.mean.nc’

sst.mon.mean.nc     100%[===================>]   1.93G  56.5MB/s    in 37s     

2023-06-23 14:22:56 (53.8 MB/s) - ‘sst.mon.mean.nc’ saved [2077814021/2077814021]

In [ ]:
# Let's install packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr

import cartopy.crs as ccrs
In [ ]:
# Read the data using xarray
data = xr.open_dataset('/content/sst.mon.mean.nc').sel(time=slice('1982','2022'))
data
Out[ ]:
<xarray.Dataset>
Dimensions:  (time: 492, lat: 720, lon: 1440)
Coordinates:
  * time     (time) datetime64[ns] 1982-01-01 1982-02-01 ... 2022-12-01
  * lat      (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
  * lon      (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9
Data variables:
    sst      (time, lat, lon) float32 ...
Attributes:
    Conventions:    CF-1.5
    title:          NOAA/NCEI 1/4 Degree Daily Optimum Interpolation Sea Surf...
    institution:    NOAA/National Centers for Environmental Information
    source:         NOAA/NCEI https://www.ncei.noaa.gov/data/sea-surface-temp...
    References:     https://www.psl.noaa.gov/data/gridded/data.noaa.oisst.v2....
    dataset_title:  NOAA Daily Optimum Interpolation Sea Surface Temperature
    version:        Version 2.1
    comment:        Reynolds, et al.(2007) Daily High-Resolution-Blended Anal...
In [ ]:
data.sst[0,:,:].plot()
Out[ ]:
<matplotlib.collections.QuadMesh at 0x7f2e246d65f0>
In [ ]:
# Let's explore our original lat and lon value

number_of_lon = len(data.lon.values)
number_of_lat = len(data.lat.values)

original_lon, original_lat = [], []

for i in range(number_of_lon):
    original_lon.append(data.lon.values[i])

for j in range(number_of_lat):
    original_lat.append(data.lat.values[j])

print('Original Longitude')
print(original_lon[0:3], original_lon[-4:-1])
print('logitudinal resolution: ', original_lon[1]-original_lon[0])

print('----------------------------------------------')

print('Original Latitude')
print(original_lat[0:3], original_lat[-4:-1])
print('latitudinal resolution: ', original_lat[1]-original_lat[0])
Original Longitude
[0.125, 0.375, 0.625] [359.125, 359.375, 359.625]
logitudinal resolution:  0.25
----------------------------------------------
Original Latitude
[-89.875, -89.625, -89.375] [89.125, 89.375, 89.625]
latitudinal resolution:  0.25

Now we will change the resolution from 0.25*0.25 to 4*4.

Suggestion

  • 1*1
In [ ]:
# Extract variable to interpolate
var = data['sst']

# Define new grid to interpolate to
lon_new = np.arange(0, 361, 4)
lat_new = np.arange(-90, 91, 4)

# Create new dataset with new grid
data_new = xr.Dataset({'lon': (['lon'], lon_new), 'lat': (['lat'], lat_new)})

# Interpolate variable to new grid
var_new = var.interp(lon=lon_new, lat=lat_new)

# Add interpolated variable to new dataset
data_new['sst'] = var_new
In [ ]:
data_new
Out[ ]:
<xarray.Dataset>
Dimensions:  (lon: 91, lat: 46, time: 492)
Coordinates:
  * lon      (lon) int64 0 4 8 12 16 20 24 28 ... 336 340 344 348 352 356 360
  * lat      (lat) int64 -90 -86 -82 -78 -74 -70 -66 ... 66 70 74 78 82 86 90
  * time     (time) datetime64[ns] 1982-01-01 1982-02-01 ... 2022-12-01
Data variables:
    sst      (time, lat, lon) float64 nan nan nan nan nan ... nan nan nan nan
In [ ]:
data_new.sst[0,:,:].plot()
Out[ ]:
<matplotlib.collections.QuadMesh at 0x7fe8c6d363b0>
In [ ]:
plt.figure(figsize=(10, 6))
plt.contourf(lon_new, lat_new, data_new.sst[0,:,:],levels = np.arange(-30,31),cmap='YlOrRd')
plt.colorbar(label='SST')
plt.title('Sea Surface Temperature')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()
In [ ]:
plt.figure(figsize=(10, 6))
plt.contourf(data.lon, data.lat, data.sst[0,:,:] ,levels = np.arange(-50,50, 2),cmap='YlOrRd')
plt.colorbar(label='SST')
plt.title('Sea Surface Temperature')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()
In [ ]:
levels = np.arange(-5, 36, 5)
fig = plt.figure(figsize=(10,10))

dataset = [data, data_new]
title = ['SST Original 0.25*0.25', 'SST Interpolated 4*4' ]

for i in range(1, 3):
    ax = plt.subplot(2, 1, i)
    plt.contourf(dataset[i-1].lon, dataset[i-1].lat, dataset[i-1].sst[0,:,:], levels=levels, cmap='YlOrRd')
    plt.title(title[i-1])


#colorbar_setting
plt.subplots_adjust(bottom=0.1, right=0.8, top=0.9)
cax = plt.axes([0.84, 0.1, 0.025, 0.8]) #plt.axes([x_position,y_position,width,height])
plt.colorbar(cax=cax)
Out[ ]:
<matplotlib.colorbar.Colorbar at 0x7f2e247ecca0>
In [ ]:
# Need suggestions on grid shifting
-- 0--360
-- -180 -- +180
# google
In [ ]: